KMID : 0381120200420020225
Genes and Genomics
2020, Volume 42, No. 2, pp. 225-234
Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project
Seok Ho-Sik

Abstract
Background: A defining characteristic of bioinformatics data is the combination of a very large number of features with a relatively small number of samples. The vast number of features makes an intuitive understanding of the target domain difficult. Dimensionality reduction, or manifold learning, has the potential to circumvent this obstacle, but in practice only a restricted set of methods has been preferred.

Objective: The objective of this study is to observe the characteristics of various dimensionality reduction methods, namely locally linear embedding (LLE), multi-dimensional scaling (MDS), principal component analysis (PCA), spectral embedding (SE), and t-distributed stochastic neighbor embedding (t-SNE), on the RNA-Seq dataset from the Genotype-Tissue Expression (GTEx) project.

Results: The characteristics of the dimensionality reduction methods are observed on nine groups from three different tissues, in reduced spaces of two, three, and four dimensions. The visualization results show that each dimensionality reduction method produces a very distinct reduced space. The quantitative results are measured as the performance of k-means clustering in the reduced space. Clustering in the reduced spaces produced by non-linear methods such as LLE, t-SNE, and SE achieved better results than clustering in the reduced spaces produced by linear methods such as PCA and MDS.
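
The evaluation described above (reduce the data, then run k-means in the reduced space) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the abstract does not name a software library, a clustering score, or any hyperparameters, so the use of scikit-learn, the adjusted Rand index, the neighbor and perplexity settings, and the synthetic three-cluster data standing in for the GTEx samples are all assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, LocallyLinearEmbedding, SpectralEmbedding
from sklearn.metrics import adjusted_rand_score


def make_reducers(n_components, seed=0):
    """Build the five reducers named in the abstract (hyperparameters are assumed)."""
    return {
        "PCA": PCA(n_components=n_components, random_state=seed),
        "MDS": MDS(n_components=n_components, random_state=seed),
        "LLE": LocallyLinearEmbedding(
            n_components=n_components, n_neighbors=10,
            eigen_solver="dense", random_state=seed,
        ),
        "SE": SpectralEmbedding(
            n_components=n_components, n_neighbors=10, random_state=seed
        ),
        # The default Barnes-Hut solver only supports fewer than 4 output
        # dimensions, so the exact solver is used to allow 4 as well.
        "t-SNE": TSNE(
            n_components=n_components, method="exact",
            perplexity=20, random_state=seed,
        ),
    }


def compare_reducers(X, labels, n_clusters, n_components=2, seed=0):
    """Embed X with each reducer, cluster with k-means, score against labels."""
    scores = {}
    for name, reducer in make_reducers(n_components, seed).items():
        embedding = reducer.fit_transform(X)
        predicted = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(embedding)
        # Adjusted Rand index: an assumed stand-in for the paper's metric.
        scores[name] = adjusted_rand_score(labels, predicted)
    return scores


# Synthetic stand-in data: many features, few samples, three known groups.
X, y = make_blobs(
    n_samples=150, n_features=50, centers=3, cluster_std=3.0, random_state=0
)
scores = compare_reducers(X, y, n_clusters=3, n_components=2)
for name, score in scores.items():
    print(f"{name}: ARI = {score:.3f}")
```

Calling `compare_reducers` with `n_components` set to 2, 3, and 4 in turn would mirror the three reduced dimensionalities mentioned in the results. Scores on synthetic blobs say nothing about how the methods rank on real RNA-Seq data.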

Conclusions: The experimental results suggest applying both linear and non-linear dimensionality reduction methods to the target data in order to grasp the underlying characteristics of a dataset intuitively.
KEYWORD
Dimensionality reduction, Manifold learning, Clustering, RNA-Seq, The genotype-tissue expression (GTEx) project
Listed journal information
SCI(E), Korea Research Foundation (KCI)